12 research outputs found

Policy-Adaptive Estimator Selection for Off-Policy Evaluation

Author: Kiyohara Haruka
Narita Yusuke
Saito Yuta
Tateno Kei
Udagawa Takuma
Publication date: 25/11/2022

Off-policy evaluation (OPE) aims to accurately evaluate the performance of counterfactual policies using only offline logged data. Although many estimators have been developed, there is no single estimator that dominates the others, because the estimators' accuracy can vary greatly depending on a given OPE task such as the evaluation policy, number of actions, and noise level. Thus, the data-driven estimator selection problem is becoming increasingly important and can have a significant impact on the accuracy of OPE. However, identifying the most accurate estimator using only the logged data is quite challenging because the ground-truth estimation accuracy of estimators is generally unavailable. This paper studies this challenging problem of estimator selection for OPE for the first time. In particular, we enable an estimator selection that is adaptive to a given OPE task, by appropriately subsampling available logged data and constructing pseudo policies useful for the underlying estimator selection task. Comprehensive experiments on both synthetic and real-world company data demonstrate that the proposed procedure substantially improves the estimator selection compared to a non-adaptive heuristic. Comment: accepted at AAAI'2

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Off-Policy Evaluation of Ranking Policies under Diverse User Behavior

Author: Kiyohara Haruka
Narita Yusuke
Saito Yuta
Shimizu Nobuyuki
Uehara Masatoshi
Yamamoto Yasuo
Publication venue: Association for Computing Machinery (ACM)
Publication date: 26/06/2023

Ranking interfaces are everywhere in online platforms. There is thus an ever-growing interest in their Off-Policy Evaluation (OPE), which aims at an accurate performance evaluation of ranking policies using logged data. A de facto approach for OPE is Inverse Propensity Scoring (IPS), which provides an unbiased and consistent value estimate. However, it becomes extremely inaccurate in the ranking setup due to its high variance under large action spaces. To deal with this problem, previous studies assume either independent or cascade user behavior, resulting in some ranking versions of IPS. While these estimators are somewhat effective in reducing the variance, all existing estimators apply a single universal assumption to every user, causing excessive bias and variance. Therefore, this work explores a far more general formulation where user behavior is diverse and can vary depending on the user context. We show that the resulting estimator, which we call Adaptive IPS (AIPS), can be unbiased under any complex user behavior. Moreover, AIPS achieves the minimum variance among all unbiased estimators based on IPS. We further develop a procedure to identify the appropriate user behavior model to minimize the mean squared error (MSE) of AIPS in a data-driven fashion. Extensive experiments demonstrate that the empirical accuracy improvement can be significant, enabling effective OPE of ranking systems even under diverse user behavior. Comment: KDD2023 Research trac

arXiv.org e-Print Archive

Future-Dependent Value-Based Off-Policy Evaluation in POMDPs

Author: Bennett Andrew
Chernozhukov Victor
Jiang Nan
Kallus Nathan
Kiyohara Haruka
Shi Chengchun
Sun Wen
Uehara Masatoshi
Publication date: 14/11/2023

We study off-policy evaluation (OPE) for partially observable MDPs (POMDPs) with general function approximation. Existing methods such as sequential importance sampling estimators and fitted-Q evaluation suffer from the curse of horizon in POMDPs. To circumvent this problem, we develop a novel model-free OPE method by introducing future-dependent value functions that take future proxies as inputs. Future-dependent value functions play a role similar to that of classical value functions in fully observable MDPs. We derive a new Bellman equation for future-dependent value functions as conditional moment equations that use history proxies as instrumental variables. We further propose a minimax learning method to learn future-dependent value functions using the new Bellman equation. We obtain a PAC result, which implies that our OPE estimator is consistent as long as futures and histories contain sufficient information about latent states and Bellman completeness holds. Finally, we extend our methods to the learning of dynamics and establish the connection between our approach and the well-known spectral learning methods in POMDPs. Comment: This paper was accepted in NeurIPS 202

arXiv.org e-Print Archive

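Development of a Composite Structure of Rammed Earth and Steel Frame (Special Feature: Creating Buildings Clad in Natural Materials) [版築と鉄骨による合成構造の開発 (特集自然素材をまとった建物づくり)]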

Author: Chizuru Kiyohara (清原千鶴)
Haruka Sugiyama (杉山晴香)
Kei-ichi Imamoto (今本啓一)
Publication venue: Tokyo University of Science (東京理科大学)
Publication date: 01/04/2019

Tokyo University of Science Repository for Academic Resources / 東京理科大学学術リポジトリ

Effectiveness of a digital device providing real-time visualized tooth brushing instructions: A randomized controlled trial

Author: Iwami Taku
Kawamura Takashi
Kiyohara Kosuke
Konda Manako
Nishioka Norihiro
Nishiura Masahiro
Okabayashi Satoe
Okazawa Yui
Shida Haruka
Takase Naoko
Yoshioka Masami
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2020

Introduction: The aim of this trial was to investigate whether a digital device that provides real-time visualized brushing instructions would contribute to the removal of dental plaque over usual brushing instructions. Methods: We conducted a single-center, parallel-group, stratified permuted block randomized controlled trial with a 1:1 allocation ratio. Eligible participants were people aged ≥ 18 years; we excluded people who met any of the following criteria: having severely crowded teeth; using an interdental cleaning implement; having an external injury in the oral cavity, or stomatitis; having fewer than 20 teeth; using an orthodontic apparatus; having visited a dental clinic; having the possibility of consulting a dental clinic; having a dental license; not owning a smartphone or tablet device; being a smoker; having taken antibiotics; being pregnant; having an allergy to the staining fluid; and being an employee of Sunstar Inc. All participants received tooth brushing instructions using video materials and were randomly assigned to one of two groups for four weeks: (1) an intervention group that used the digital device, which provided real-time visualized instructions through a connection with a mobile application; and (2) a control group that used a digital device which only collected their brushing logs. The primary outcome was the change in the 6-point method plaque control record (PCR) score of all teeth between baseline and week 4. The t-test was used to compare the two groups in accordance with intention-to-treat principles. Results: Among 118 enrolled individuals, 112 participants were eligible for our analyses. The mean PCR score at week 4 was 45.05% in the intervention group and 49.65% in the control group, and the change in PCR score from baseline was −20.46% in the intervention group and −15.77% in the control group (p = 0.088, 95% confidence interval −0.70–10.07). Conclusions: A digital device providing real-time visualized brushing instructions may be effective for the removal of dental plaque.

Directory of Open Access Journals

Kyoto University Research Information Repository

Prehospital cardiopulmonary resuscitation duration and neurological outcome after out-of-hospital cardiac arrest among children by location of arrest: a Nationwide cohort study

Author: Iwami Taku
Kawamura Takashi
Kiguchi Takeyuki
Kishimori Takefumi
Kitamura Tetsuhisa
Kiyohara Kosuke
Kobayashi Daisuke
Matsuyama Tasuku
Nishiyama Chika
Okabayashi Satoe
Shida Haruka
Shimamoto Tomonari
Publication venue: Springer Science and Business Media LLC
Publication date: 23/08/2019

Background: Little is known about the associations between the duration of prehospital cardiopulmonary resuscitation (CPR) by emergency medical services (EMS) and outcomes among paediatric patients with out-of-hospital cardiac arrests (OHCAs). We investigated these associations and the optimal prehospital EMS CPR duration by the location of arrests. Methods: We included paediatric patients aged 0–17 years with OHCAs before EMS arrival who were transported to medical institutions after resuscitation by bystanders or EMS personnel. We excluded paediatric OHCA patients for whom CPR was not performed, who had cardiac arrest after EMS arrival, whose EMS CPR duration were 30 min) in both groups (1.4% [6/417] in residential locations and 0.6% [1/170] in public locations). Conclusions: A longer prehospital EMS CPR duration is independently associated with a lower proportion of patients with a favourable neurological outcome. The association between prehospital EMS CPR duration and neurological outcome differed significantly by location of arrests

Kyoto University Research Information Repository

Osaka University Knowledge Archive

Institutional Repositories DataBase (IRDB)